ABSTRACT

We describe a scheme to extract linearly supporting (LSU) features from stellar spectra to
automatically estimate the atmospheric parameters Teff, log g, and [Fe/H]. “Linearly supporting”
means that the atmospheric parameters can be accurately estimated from the extracted
features through a linear model. The successive steps of the process are as follows: first,
decompose the spectrum using a wavelet packet (WP) and represent it by the derived decomposition
coefficients; second, detect representative spectral features from the decomposition coefficients
using the proposed method LASSO(LARS)bs, which is based on the Least Absolute Shrinkage and
Selection Operator (LASSO); third, estimate the atmospheric parameters Teff, log g, and [Fe/H]
from the detected features using a linear regression method. One prominent characteristic of this
scheme is its ability to evaluate quantitatively the contribution of each detected feature to the
atmospheric parameter estimate and also to trace back the physical significance of that feature.
This work also shows that the usefulness of a component depends on both wavelength and frequency.
The proposed scheme has been evaluated on both real spectra from the Sloan Digital Sky Survey
(SDSS)/SEGUE and synthetic spectra calculated from Kurucz’s NEWODF models. On real spectra, we
extracted 23 features to estimate Teff, 62 features for log g, and 68 features for [Fe/H]. Test
consistencies between our estimates and those provided by the Spectroscopic Parameter Pipeline of
SDSS show that the mean absolute errors (MAEs) are 0.0062 dex for log Teff (83 K for Teff),
0.2345 dex for log g, and 0.1564 dex for [Fe/H]. For the synthetic spectra, the MAE test
accuracies are 0.0022 dex for log Teff (32 K for Teff), 0.0337 dex for log g, and 0.0268 dex for
[Fe/H].

Subject headings: stars: atmospheres - stars: fundamental parameters - methods: statistical - methods: 
data analysis - stars: abundances 


1. INTRODUCTION


Large-scale, deep sky survey programs, such as the Sloan Digital Sky Survey (SDSS;
York et al. 2000; Ahn et al. 2012), the Large Sky Area Multi-object Fiber Spectroscopic
Telescope (LAMOST)/Guoshoujing Telescope, and the Gaia-ESO Survey (Gilmore et al. 2012;
Randich et al. 2013), are collecting and will obtain very large numbers of stellar spectra.
This enormous wealth of data makes it necessary to use a fully automated process to
characterize the spectra, which in turn will enable statistical exploration of the
atmospheric parameter-related properties in the spectra.

This paper investigates the problem of representing stellar spectra by significant features
to estimate atmospheric parameters. The spectrum representation problem is a vital procedure
in the aforementioned tasks and is usually referred to as feature extraction in data mining
and pattern recognition. For example, in atmospheric parameter estimation, a spectrum can be
represented by the full observed spectrum (Bailer-Jones 2000; Shkedy et al. 2007), the
corrected spectrum (Allende Prieto et al. 2006), the description of some critical spectral
lines (Mishenina et al. 2006; Muirhead et al. 2012), a statistical description
(Re Fiorentin et al. 2007), etc. In the present paper, we describe a scheme for extracting
LSU (linearly supporting) features from stellar spectra to estimate atmospheric parameters.

“Linearly supporting” means that the atmospheric parameters should be accurately estimated
from the extracted features using a linear model. Such a model helps to evaluate the
contribution of each feature to the atmospheric parameter estimate and also to trace back the
physical interpretation of that feature. It is known that the dependence of the three basic
atmospheric parameters Teff, log g, and [Fe/H] on the stellar spectra is highly nonlinear
(Tables 6, 10, and 11 in Li et al. 2014). Therefore, we first perform a nonlinear
transformation on the spectrum before detecting LSU features. In this work, this initial
transformation is performed using a wavelet packet (WP). The time-frequency localization of
the WP allows us to isolate potential unwanted influence from noise and redundancy, and also
helps us to backtrack the physical absorptions or emissions that contribute to a specific
analysis result. This work also shows that the effectiveness of a component depends on both
wavelength and frequency.

Based on the WP decomposition of a spectrum and the Least Absolute Shrinkage and Selection
Operator (LASSO) method (Tibshirani 1996), we propose an algorithm, LASSO(LARS)bs, to explore
a parsimonious representation of the parameterization model. Using LASSO(LARS)bs, we
extracted 23 features to estimate Teff, 62 features for log g, and 68 features for [Fe/H].
Experiments (Section 5) on real spectra from SDSS and on synthetic spectra show the
effectiveness of the detected features through the application of two typical linear
regression methods: Ordinary Least Squares (OLS) and Support Vector Regression with a linear
kernel (SVRl; Schölkopf et al. 2005; Smola et al. 2004).


The proposed scheme is a type of statistical learning method. The fundamental suppositions
are that (1) two stars with different atmospheric parameters have distinct spectra, and
(2) there is a set of observed stellar spectra or synthetic spectra with known atmospheric
parameters, referred to as a training set in machine learning and data mining. Apart from
these two suppositions, no other a priori physical assumptions are made. The first
supposition states that there exists a mapping from stellar spectra to their atmospheric
parameters. Based on these two suppositions, the proposed scheme can automatically discover
this mapping, which is also known as the spectral parameterization model in astronomical
data analysis, using several proposed procedures.

This paper is organized as follows. Section 2 describes the stellar spectra used in this
study. In Section 3, a proposed stellar parameter estimation model is introduced. Section 4
presents the overall configuration of the proposed scheme and investigates the feature
recombination of a spectrum based on the WP transform. Section 5 reports some experimental
evaluations. Section 6 discusses some technical problems, such as the optimal configuration
for the WP decomposition, the sufficiency and compactness of the detected features, and the
advantages and disadvantages of redundancy. Finally, we summarize our work in Section 7.

2. DATA SETS 

The scheme proposed in Section 3 and the following sections has been evaluated on both real
spectra from SDSS/SEGUE and synthetic spectra calculated from Kurucz’s NEWODF models. Real
data usually present some disturbances arising from noise and pre-processing imperfections
(e.g., sky lines and/or cosmic ray removal residuals, residual calibration defects), which
are not present in synthetic spectra. The atmospheric parameter estimation process must be
able to tolerate these disturbances. Synthetic spectra, by contrast, are computed from known
(ground-truth) parameters, which serve as the reference.

Our scheme belongs to the class of statistical learning methods. The fundamental idea is to
discover the linearly predictive relationship between stellar spectra and the atmospheric
parameters Teff, log g, and [Fe/H] from empirical data, which constitute a training set. At
the same time, the performance of the discovered predictive relationships should also be
evaluated objectively. Therefore, a separate, independent set of stellar spectra is needed
for this evaluation, usually referred to as a test set in machine learning. However, most
learning methods tend to overfit the empirical data. In other words, statistical learning
methods can pick up spurious relationships in the training data that do not hold in general.
In order to avoid overfitting, we require a third independent set of spectra to optimize the
parameters that need to be adjusted objectively when investigating the potential
relationships: this third set of spectra, along with their reference parameters, constitutes
the validation set.

Therefore, in each experiment, we split the total spectra samples into three subsets: the
training set, validation set, and test set. The training set is the carrier of knowledge,
and the proposed scheme should learn from this training set. The validation set is the
mentor/instructor of the proposed scheme, which can independently and objectively provide
some advice in the learning process. The training set and validation set are used to
establish a model, while the test set acts as a referee to objectively evaluate the
performance of the established model. The roles of the three subsets are listed in Table 1.
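The three-way split can be sketched as follows. This is only an illustration: the text does not say how spectra are assigned to the subsets, so the random shuffling, the fixed seed, and the helper name `split_indices` are assumptions. The sizes in the usage lines are those quoted for the real spectra in Section 2.1.

```python
import random


def split_indices(n_total, n_train, n_val, seed=0):
    """Randomly partition range(n_total) into train/validation/test index lists.

    The random assignment and the seed are illustrative assumptions; the
    paper only specifies the subset sizes.
    """
    if n_train + n_val > n_total:
        raise ValueError("training and validation sets exceed the sample size")
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # everything left over is the test set
    return train, val, test


# Subset sizes quoted for the real SDSS/SEGUE spectra.
train, val, test = split_indices(50_000, 10_000, 10_000)
```

Keeping the three index lists disjoint is what lets the test set act as an independent referee.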


2.1. Real Spectra from SDSS/SEGUE 


In this work, we use 50,000 real spectra from the SDSS/SEGUE database (Abazajian et al. 2009;
Yanny et al. 2009). The selected spectra span the ranges [4088, 9740] K in effective
temperature Teff, [1.015, 4.998] dex in surface gravity log g, and [-3.497, 0.268] dex in
metallicity [Fe/H], as estimated by the Spectroscopic Parameter Pipeline (SSPP;
Beers et al. 2006; Lee et al. 2008a,b; Allende Prieto et al. 2008; Smolinski et al. 2011;
Lee et al. 2011). All stellar spectra are initially shifted to their rest frames (zero radial
velocity) using the radial velocity provided by SSPP. They are also rebinned to a maximal
common log(wavelength) range [3.581862, 3.963961] with a sampling step of 0.0001.^1 The
sizes of the training set, validation set, and test set are 10,000, 10,000, and 30,000
spectra, respectively.
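The rest-frame shift and rebinning can be sketched as below. The text gives only the grid start, the step, and (later) the spectrum length n = 3821; the non-relativistic Doppler formula, the linear interpolation, and the names `common_grid` and `rest_frame_rebin` are assumptions made for this sketch.

```python
import numpy as np

C_KMS = 299792.458        # speed of light in km/s
LOG_START = 3.581862      # log10(wavelength / Angstrom) of the first pixel
LOG_STEP = 0.0001         # sampling step in log10(wavelength)
N_PIX = 3821              # spectrum length quoted in Section 4.2.2


def common_grid():
    """Common log10(wavelength) grid shared by all spectra."""
    return LOG_START + LOG_STEP * np.arange(N_PIX)


def rest_frame_rebin(wave_obs, flux, rv_kms):
    """Shift an observed spectrum to zero radial velocity and rebin it.

    wave_obs must be increasing (in Angstrom). Linear interpolation is an
    assumption; the paper does not state which resampling method was used.
    """
    wave_rest = np.asarray(wave_obs, dtype=float) / (1.0 + rv_kms / C_KMS)
    return np.interp(common_grid(), np.log10(wave_rest), flux)


grid = common_grid()
flat = rest_frame_rebin(10.0 ** grid, np.ones(N_PIX), rv_kms=0.0)
```

With this step, 3821 pixels start at about 3818.23 Å and stay inside the quoted range.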


We take the atmospheric parameters previously estimated by SSPP for the real spectra as
reference values. The SSPP estimation is based on both stellar spectra and ugriz photometry,
and combines the results of multiple techniques to alleviate the limitations of any specific
method; see Lee et al. (2008a) and references therein. SSPP has been extensively validated by
comparing its estimates with the sets of parameters obtained from high-resolution spectra of
SDSS-I/SEGUE stars (Allende Prieto et al. 2008) and with the available information from the
literature for stars in Galactic open and globular clusters (Lee et al. 2008b;
Smolinski et al. 2011).

^1 The common wavelength range is approximately [3818.23, 9203.67] Å.


2.2. Synthetic Spectra 

A set of 18,969 synthetic spectra is calculated with the SPECTRUM (v2.76) package
(Gray et al. 1994) and Kurucz’s NEWODF models (Castelli et al. 2003). When generating the
synthetic spectra, 830,828 atomic and molecular lines are used (contained in two files,
luke.lst and luke.nir.lst); the atomic and molecular data are stored in the file
stdatom.dat, which includes solar atomic abundances from Grevesse et al. (1998). The
SPECTRUM package and the three data files can be downloaded from its website.


Our grids of synthetic stellar spectra span the parameter ranges [4000, 9750] K in Teff
(45 values, step sizes of 100 K between 4000 and 7500 K and 250 K between 7750 and 9750 K),
[1, 5] dex in log g (17 values, step size of 0.25 dex), and [-3.6, 0.3] dex in [Fe/H]
(27 values, step size of 0.2 dex between -3.6 and -1 dex and 0.1 dex between -1 and 0.3 dex).
The synthetic stellar spectra are also split into three subsets: the training set,
validation set, and test set, with respective sizes of 8500, 1969, and 8500 spectra.
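As a quick consistency check of the grid description, the node values can be enumerated directly. The list names below are illustrative. The last line shows that the full Cartesian product would contain 20,655 nodes, more than the 18,969 spectra reported, so not every parameter combination appears in the set.

```python
# Effective temperature: 100 K steps up to 7500 K, then 250 K steps.
teff = list(range(4000, 7501, 100)) + list(range(7750, 9751, 250))

# Surface gravity: 0.25 dex steps from 1 to 5 dex.
logg = [1.0 + 0.25 * k for k in range(17)]

# Metallicity: 0.2 dex steps from -3.6 to -1.0, then 0.1 dex steps up to 0.3.
feh = ([round(-3.6 + 0.2 * k, 1) for k in range(14)]
       + [round(-0.9 + 0.1 * k, 1) for k in range(13)])

# Size of the full Cartesian grid, for comparison with the 18,969 spectra used.
n_full_grid = len(teff) * len(logg) * len(feh)
```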


3. A LINEAR ESTIMATION MODEL FOR 
ATMOSPHERIC PARAMETERS 

3.1. Model 

Let a vector x = (x_1, ..., x_p)^T represent a spectrum and y be an atmospheric parameter to
be estimated, where p > 0. The component x_j represents the flux of the spectrum x,
j ∈ {1, 2, ..., p}. We investigate the atmospheric parameter estimation problem based on a
linear model:

    y = f(x; w) = \sum_{j=1}^{p} w_j x_j,    (1)

where w = (w_1, ..., w_p) are free parameters characterizing the model. For convenience, we
assume that the mean of the variable y to be estimated is zero; otherwise, a constant term
w_0 should be added to the right side of Equation (1).

Table 1: Roles of the Three Data Sets.

Data Set         Roles
Training set     To be used in
                 (1) detecting features by LASSO(LARS)bs;
                 (2) estimating the parameterizing model (OLS, SVRl).
Validation set   To be used in
                 (1) determining the configuration of the wavelet packet decomposition
                     (Section 6.1);
                 (2) determining the parameters in SVRl.
Test set         To be used in performance evaluation (Sections 5 and 6.2).

Note. SVRl: support vector regression with a linear kernel.


In this work, y can be the effective temperature Teff, the surface gravity log g, or the
metallicity [Fe/H]. The stellar spectra are analyzed three times, once for each of these
three parameters. To reduce the dynamical range and to better represent the uncertainties of
the spectral data, we use log Teff instead of Teff in our analysis
(Re Fiorentin et al. 2007).


Under the linear regression model in Equation (1), it is easy to evaluate the influence of a
flux component x_j on the estimate y: a regression coefficient w_j gives the variation of the
parameter y to be estimated when the component x_j is changed by one unit while the other
flux components {x_1, ..., x_{j-1}, x_{j+1}, ..., x_p} are kept constant. Therefore, the
model in Equation (1) describes the linear support for the parameter to be estimated from
every component of a spectrum.


Suppose that S_F is the set of flux/predictor components of stellar spectra whose model
coefficients w_j ≠ 0 in Equation (1), and \bar{S}_F is the set of flux components whose model
coefficients w_j = 0. Then all of the components belonging to \bar{S}_F are ineffective in
model (1), and S_F is the set of components necessary and sufficient for estimating y based
on the linear model (1). Therefore, the components in S_F are called a set of LSU features
for the parameter to be estimated in Equation (1).
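A toy illustration of Equation (1) and of the LSU set: with a sparse coefficient vector, the prediction computed from the non-zero components alone equals the prediction from the full spectrum. The numbers, the helper names, and the zero-based indexing are illustrative assumptions, not values from the paper.

```python
import numpy as np


def predict(x, w):
    """Linear model of Equation (1): y = sum_j w_j * x_j."""
    return float(np.dot(w, x))


def lsu_features(w, tol=0.0):
    """Indices j (zero-based) whose coefficient w_j is non-zero, i.e. the set S_F."""
    return [j for j, w_j in enumerate(w) if abs(w_j) > tol]


# Hypothetical five-component "spectrum" and sparse coefficient vector.
w = np.array([0.5, 0.0, 0.0, -1.2, 0.0])
x = np.array([1.0, 7.0, -3.0, 2.0, 4.0])

s_f = lsu_features(w)
y_full = predict(x, w)            # uses every flux component
y_lsu = predict(x[s_f], w[s_f])   # uses only the components in S_F
```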


3.2. Model Selection 

The model in Equation (1) can be determined by checking its consistency with a set of
labeled spectra

    S = \{ (x^i, y_i), \; i = 1, 2, \ldots, N \},    (2)

where x^i is a spectrum and y_i is the corresponding atmospheric parameter. The consistency
is usually evaluated using the Mean Squared Error (MSE):

    MSE(w) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - f(x^i; w) \right)^2.    (3)

When we select the model in Equation (1) by minimizing the MSE,

    \hat{w} = \arg\min_{w} \{ MSE(w) \},    (4)

the model f(·; \hat{w}) derived from Equation (4) is referred to as the OLS regression. In
this OLS model, most of the coefficients w_1, w_2, ..., w_p are non-zero, and we will call it
a complex model for convenience. This complexity usually makes model (1) suffer from
redundant and irrelevant variables in the data (such as noise or pre-processing artifacts),
which in turn can lead to overfitting and to difficulties in exploring the most significant
factors in high-dimensional spectra.

To overcome or alleviate the aforementioned limitations, a typical strategy is to regularize
the objective function (4) by the ℓ1-norm of the model parameter w:

    \hat{w} = \arg\min_{w} \Big\{ \sum_{i=1}^{N} \left( y_i - f(x^i; w) \right)^2 + \lambda \|w\|_1 \Big\},    (5)

where

    \|w\|_1 = \sum_{j=1}^{p} |w_j|.    (6)

The model f(·; \hat{w}) derived from Equation (5) is called LASSO (Least Absolute Shrinkage
and Selection Operator) (Tibshirani 1996). Here, λ > 0 is a tuning parameter that controls
the number of non-zero parameters w_j, or equivalently the complexity of the selected model.
Studies show that LASSO can effectively filter out most of the redundant or irrelevant
variables by shrinking some parameters w_j to exactly zero (James et al. 2013). We use the
Matlab implementation (Sjöstrand 2005) of LASSO based on the LARS algorithm
(Efron et al. 2004). To highlight the implementation based on LARS, we label it LASSO(LARS).
In LASSO(LARS), the parameter λ can be equivalently replaced with the number m of non-zero
parameters w_j (Efron et al. 2004).
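The contrast between the dense OLS solution of Equation (4) and the sparse LASSO solution of Equation (5) can be reproduced on toy data. Note the substitution: the sketch below minimizes Equation (5) by cyclic coordinate descent, not by the LARS algorithm used in the paper. The synthetic data, the value λ = 50, and the function names are assumptions made for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimize sum_i (y_i - x^i . w)^2 + lam * ||w||_1 by coordinate descent.

    This is a stand-in for LASSO(LARS): same objective, different solver.
    """
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    residual = y.astype(float).copy()          # equals y - X @ w for w = 0
    for _ in range(n_sweeps):
        for j in range(p):
            residual += X[:, j] * w[j]         # remove feature j's contribution
            rho = X[:, j] @ residual
            w[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
            residual -= X[:, j] * w[j]         # put the updated contribution back
    return w


# Toy data: only predictors 0 and 3 actually influence y.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.standard_normal(200)

w_ols = np.linalg.lstsq(X, y, rcond=None)[0]   # Equation (4): dense solution
w_lasso = lasso_cd(X, y, lam=50.0)             # Equation (5): sparse solution
```

In this toy setting OLS assigns a small non-zero weight to every irrelevant predictor, whereas the ℓ1 penalty sets them to exactly zero, which is the property exploited for feature selection.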


Selecting features using LASSO is equivalent to determining the subset of model coefficients
{w_j, j = 1, ..., p} with non-zero values. Suppose that S_w represents the subset of
non-zero model coefficients w_j in Equations (1) and (5). Essentially, LARS is an
implementation of LASSO based on a forward selection scheme. It starts with all coefficients
equal to zero, {w_j = 0, j = 1, ..., p}, by setting the initial estimate^3 \hat{y}_0 = 0 and
S_w = ∅, where ∅ is the empty set. It then expands S_w gradually as follows. First, the LARS
algorithm finds the predictor x_{j_1} best correlated with the response y and expands S_w
from the empty set to {w_{j_1}} by moving the value of w_{j_1} by the largest possible step
in the direction of predictor x_{j_1} until some other predictor x_{j_2} has as much
correlation with the current estimation residual.^4 In the next step, LARS stops the motion
along x_{j_1} and proceeds in an equiangular direction between the two predictors x_{j_1}
and x_{j_2} (least angle direction) by adjusting w_{j_1} and w_{j_2} simultaneously, with
S_w = {w_{j_1}, w_{j_2}}, until a third predictor x_{j_3} has as much correlation with the
current estimation residual.^5 Then, LARS proceeds in an equiangular direction between
x_{j_1}, x_{j_2}, and x_{j_3} (least angle direction) until a fourth predictor x_{j_4} is
found, and S_w = {w_{j_1}, w_{j_2}, w_{j_3}}. The LARS algorithm can select m features if
the above procedure continues, where m is an empirically preset number representing the
number of non-zero parameters w_j. Interested readers are referred to Efron et al. (2004)
for further information concerning LARS.

^3 The subscript 0 of \hat{y}_0 indicates that this estimate is computed without considering
any predictor.

^4 Now, x_{j_1} and x_{j_2} are tied for the highest correlation with the current estimation
residual.

^5 Currently, x_{j_1}, x_{j_2}, and x_{j_3} are tied for the highest correlation with the
current estimation residual.

Note that aside from LASSO, there are multiple alternatives for sparse model selection, for
example, Forward Stepwise Selection, Backward Stepwise Selection, Forward Stagewise
(Hastie et al. 2009; James et al. 2013), Elastic Net (Zou and Hastie 2005), etc.
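To give a feel for how correlation-driven forward methods build up a sparse model, here is a minimal sketch of incremental Forward Stagewise regression, one of the alternatives named above. It is not LARS itself: LARS takes the largest possible step at each stage, whereas this procedure takes many tiny steps of size `eps` along the predictor most correlated with the current residual. The data, step size, and function name are assumptions.

```python
import numpy as np


def forward_stagewise(X, y, eps=0.01, n_steps=1000):
    """Incremental forward stagewise regression on standardized predictors."""
    w = np.zeros(X.shape[1])
    residual = y.astype(float).copy()
    for _ in range(n_steps):
        corr = X.T @ residual                 # correlation with the residual
        j = int(np.argmax(np.abs(corr)))      # most correlated predictor
        delta = eps * np.sign(corr[j])
        w[j] += delta                         # nudge only that coefficient
        residual -= delta * X[:, j]
    return w


# Toy data with standardized columns; only predictors 0 and 3 matter.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 3.0 * X[:, 0] - 2.0 * X[:, 3]

w_early = forward_stagewise(X, y, n_steps=30)    # only predictor 0 is active
w_late = forward_stagewise(X, y, n_steps=1000)   # close to the true coefficients
```

Stopping early yields a model with few active predictors, which is the same role that the preset feature count m plays in LASSO(LARS).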


3.3. Refining the Selected Model 

To select a model with k_0 features, we can use LASSO(LARS) directly to impose the
constraint k_0 on the number of features, or first select a model with m features using
LASSO(LARS) and then eliminate m − k_0 features iteratively, one by one, where k_0 and m are
two positive integers and m > k_0. For convenience, we call the above-mentioned schemes,
respectively, “direct LASSO(LARS)” and “LASSO(LARS)bs” (LASSO(LARS) with backward
selection). Experiments show that the LASSO(LARS)bs scheme is better than the direct
LASSO(LARS).

On the whole, LASSO(LARS) is a forward selection method. Its drawback is that each addition
of a new feature may make one or more of the already included variables not sufficiently
significant, and even less significant than the excluded variables. LASSO(LARS)bs can choose
more variables as candidates and take more combination effects of variables into
consideration, at the cost of additional computation: it is a balance between accuracy and
time complexity. The proposed LASSO(LARS)bs scheme works as follows.

1. Select a linear model with m non-zero coefficients based on a training set by LASSO(LARS)
(see Equation (5) and Section 3.2 above); the corresponding variables form a set S_F.

2. For every element s ∈ S_F, compute two OLS estimates f(·; w^1) and f(·; w^2) based on the
variables S_F and S_F − {s}, respectively, from a training set.

3. Evaluate the effectiveness of s using Eff(s) = MAE(f(·; w^2)) − MAE(f(·; w^1)), where the
MAE is computed based on a validation set (see Equation (8) in Section 3.4 for the
definition of the MAE).

4. Compute s_0 = arg min_{s ∈ S_F} {Eff(s)}, and let S_F = S_F − {s_0}.

5. If the size of S_F is greater than k_0, go to step 2; otherwise, return S_F as the
extracted features and take the OLS estimate on S_F as the final model.
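Steps 2 to 5 above can be sketched as follows. The candidate list passed in stands for the m features produced by LASSO(LARS) in step 1, which this sketch does not perform; the function names and the toy data are assumptions.

```python
import numpy as np


def ols_fit(X, y, features):
    """OLS coefficients using only the listed feature columns."""
    w, *_ = np.linalg.lstsq(X[:, features], y, rcond=None)
    return w


def mae(X, y, features, w):
    """Mean absolute error of the linear model restricted to `features`."""
    return float(np.mean(np.abs(y - X[:, features] @ w)))


def backward_select(X_tr, y_tr, X_val, y_val, candidates, k0):
    """Drop the least effective feature one at a time until k0 remain."""
    selected = list(candidates)
    while len(selected) > k0:
        w1 = ols_fit(X_tr, y_tr, selected)                    # model on S_F
        mae_full = mae(X_val, y_val, selected, w1)
        eff = {}
        for s in selected:
            reduced = [f for f in selected if f != s]
            w2 = ols_fit(X_tr, y_tr, reduced)                 # model on S_F - {s}
            eff[s] = mae(X_val, y_val, reduced, w2) - mae_full
        s0 = min(eff, key=eff.get)                            # smallest loss when removed
        selected.remove(s0)
    return selected, ols_fit(X_tr, y_tr, selected)


# Toy data: only features 0 and 3 carry information about y.
rng = np.random.default_rng(2)
X_tr, X_val = rng.standard_normal((300, 5)), rng.standard_normal((300, 5))
y_tr = 3.0 * X_tr[:, 0] - 2.0 * X_tr[:, 3] + 0.01 * rng.standard_normal(300)
y_val = 3.0 * X_val[:, 0] - 2.0 * X_val[:, 3] + 0.01 * rng.standard_normal(300)

selected, w_final = backward_select(X_tr, y_tr, X_val, y_val, [0, 1, 2, 3, 4], k0=2)
```

Each pass refits the model once per remaining feature, which is where the extra time complexity relative to direct LASSO(LARS) comes from.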


3.4. Evaluation Methods

Suppose that S_te = {(x^m, y_m), m = 1, 2, ..., M} is a test set. In this work, the
performance of the proposed scheme is evaluated using three measures: Mean Error (ME), MAE,
and Standard Deviation (SD). They have been used in related research
(Re Fiorentin et al. 2007; Jofré et al. 2010; Tan et al. 2013) and are defined as follows:

    ME = \frac{1}{M} \sum_{m=1}^{M} e_m,    (7)

    MAE = \frac{1}{M} \sum_{m=1}^{M} |e_m|,    (8)

    SD = \sqrt{ \frac{1}{M} \sum_{m=1}^{M} (e_m - ME)^2 },    (9)

where e_m is the error/difference between the reference value of the stellar parameter and
its estimate,

    e_m = y_m - f(x^m), \quad m = 1, \ldots, M.    (10)

ME, MAE, and SD are all widely used in the performance evaluation of an estimation process.
Each measure focuses on a different aspect of the estimation process. ME measures the
average signed deviation, reflecting systematic errors: if the expectation of ME is 0, then
f(x^m) is referred to as a statistically unbiased estimator of y_m. MAE assesses the average
magnitude of the deviation by ignoring the sign/direction of an error. SD shows how much
variation exists in the estimation error and reflects the stability/robustness of the
estimation process. A low SD indicates that the performance of the proposed estimation
scheme is very stable; a high SD indicates that its performance is sensitive to the specific
spectrum being processed. If the errors {e_m, m = 1, 2, ..., M} are independent, identically
distributed (iid) random variables with a normal density distribution

    \phi(e; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(e - \mu)^2}{2\sigma^2} \right),

then ME and SD are the estimates of μ and σ, respectively. In addition, MAE is the estimate
of \sigma\sqrt{2/\pi} if the errors {e_m, m = 1, 2, ..., M} are iid random variables with a
normal density distribution

    \phi(e; 0, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{e^2}{2\sigma^2} \right)

(Geary 1994).
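Equations (7) to (10) translate directly into code. The sketch below uses made-up reference and estimated values. The helper `gaussian_mae` encodes the zero-mean normal relation MAE ≈ σ√(2/π) mentioned above. The function names are illustrative.

```python
import math


def evaluate(y_ref, y_est):
    """Return (ME, MAE, SD) of the errors e_m = y_ref[m] - y_est[m]."""
    errors = [r - e for r, e in zip(y_ref, y_est)]
    m = len(errors)
    me = sum(errors) / m                                      # Equation (7)
    mae = sum(abs(e) for e in errors) / m                     # Equation (8)
    sd = math.sqrt(sum((e - me) ** 2 for e in errors) / m)    # Equation (9)
    return me, mae, sd


def gaussian_mae(sigma):
    """Expected MAE for zero-mean normal errors with standard deviation sigma."""
    return sigma * math.sqrt(2.0 / math.pi)


# Made-up example: the errors are 1, -1, 2, -2.
me, mae, sd = evaluate([1.0, 2.0, 3.0, 4.0], [0.0, 3.0, 1.0, 6.0])
```

Here ME is zero even though no single estimate is exact, which is why ME alone cannot characterize accuracy and is reported together with MAE and SD.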


4. OVERALL CONFIGURATION AND 
SPECTRAL FEATURE ANALYSIS BASED 
ON WP 


4.1. Overall Configuration 


The dependence of the atmospheric parameters Teff, log g, and [Fe/H] on stellar spectra is
highly nonlinear (Tables 6, 10, and 11 in Li et al. 2014). Therefore, a nonlinear
transformation should be performed on spectra before detecting LSU features to estimate
stellar parameters. Several statistical procedures are performed to estimate the atmospheric
parameters Teff, log g, and [Fe/H] in the proposed scheme.


A flowchart of the procedures is presented in Fig. 1 to demonstrate the end-to-end flow in
the analysis. The initial step, “Decompose spectra by WP transform,” is introduced in
Section 4.2 below. This step requires that some technical choices be made, such as the
selection of the wavelet basis function and of the level of wavelet packet decomposition
(WPD). These problems are discussed in Section 6.1. After decomposing the stellar spectra,
we can detect and extract features using the LASSO(LARS)bs method (Section 3.3) to reduce
redundancy and noise (Section 4.3).


4.2. WP Transform 

We apply the WP transform to our stellar spectrum and decompose it into a series of
components with different wavelengths and different frequencies (time-frequency
localization).

Suppose that x = (x_1, ..., x_n)^T ∈ R^n is a spectrum consisting of n fluxes (sampling
points): we refer to it as a signal with length n. Since the spectrum considered is a
one-dimensional signal, our discussion focuses on one-dimensional WPs.


Fig. 1. Flowchart to show the order in which the statistical procedures are used in the
analysis.

4.2.1. Principles

WPs can decompose a signal into a low-frequency approximation signal and high-frequency
details, and can iteratively re-decompose those signals to provide increasingly accurate
frequency resolution. For example, in Fig. 2, WPs decompose a signal x into a low-frequency
approximation signal x[1,0] = (x_{1,0}^1, x_{1,0}^2, ..., x_{1,0}^{n_1}) and a
high-frequency detail signal x[1,1] = (x_{1,1}^1, x_{1,1}^2, ..., x_{1,1}^{n_1}). We call
x[1] = {x[1,0], x[1,1]} the first-level WP decomposition of signal x. Then, x[1,0] can also
be further decomposed into x[2,0] = (x_{2,0}^1, x_{2,0}^2, ..., x_{2,0}^{n_2}) and
x[2,1] = (x_{2,1}^1, x_{2,1}^2, ..., x_{2,1}^{n_2}), and x[1,1] into
x[2,2] = (x_{2,2}^1, x_{2,2}^2, ..., x_{2,2}^{n_2}) and
x[2,3] = (x_{2,3}^1, x_{2,3}^2, ..., x_{2,3}^{n_2}). These four new resulting signals are
called the second-level WP decomposition x[2] = {x[2,j], j = 0, ..., 3} of signal x.

If this decomposition procedure is repeated again and again, then a series of
decompositions, x[3], x[4], ..., are generated and form the WP decomposition tree of the
signal x (see Fig. 2), where x[i] = {x[i,j], j = 0, ..., 2^i − 1} ∈ R^{N_i} is the ith level
WP decomposition, and N_i is an integer described in detail in Section 4.2.2. At each level,

    x[i,j] = \{ x_{i,j}^k, \; 1 \le k \le n_i \}    (11)

is a set of decomposition components belonging to a frequency sub-band, where n_i is an
integer described in detail in Section 4.2.2. The frequency of a sub-band x[i,j_1] is higher
than that of a sub-band x[i,j_2], where i ≥ 1 and 0 ≤ j_2 < j_1 < 2^i. Therefore, there are
2^i frequency sub-bands on the ith level WP decomposition, and the (i+1)th level WP
decomposition has higher frequency resolution than the ith level WP decomposition, where
i > 0 and j ≥ 0.

Traditionally, a sub-band x[i,j] is referred to as a node of a WPD tree (Fig. 2), and the
component x_{i,j}^k is referred to as a WP coefficient.
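The tree structure can be made concrete with the simplest wavelet, Haar. The sketch below is not the Matlab Wavelet Toolbox implementation used in the paper: the orthonormal Haar filter pair, the padding rule for odd lengths, and the function names are assumptions. Node j of level i in the returned list corresponds to x[i,j] in the indexing above.

```python
import numpy as np


def haar_split(x):
    """Split a signal into Haar approximation and detail halves."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                       # pad odd lengths by repeating the last sample
        x = np.append(x, x[-1])
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail


def wp_decompose(x, level):
    """Return the 2**level sub-bands x[level, 0], ..., x[level, 2**level - 1]."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(level):
        # Unlike the plain wavelet transform, every node is split again,
        # including the detail nodes.
        nodes = [band for node in nodes for band in haar_split(node)]
    return nodes


x = np.arange(8.0)
level2 = wp_decompose(x, 2)              # [x[2,0], x[2,1], x[2,2], x[2,3]]
```

For an even-length input, the orthonormal Haar pair preserves the total energy of the signal across the sub-bands of a level.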

4.2.2. Implementations

In this work, we use the WP implementation of the Wavelet Toolbox in Matlab. WP
decomposition is implemented by filtering and downsampling, and the filter is a vector
associated with a basis function. Suppose that x is a signal with length n to be decomposed
by WP in Fig. 2 and the length of the filter is m. Then, the length of x[1,j] is
n_1 = ceil(n/2) + ceil(m/2) − 1, where j ∈ {0, 1} and ceil(z) is a function that rounds its
parameter z up to the nearest integer toward infinity:

    ceil(z) = k, \quad \text{if } k - 1 < z \le k,    (12)

where k is an integer. Therefore, the length of the first-level WP decomposition x[1] is
N_1 = n_1 × 2^1 = 2(ceil(n/2) + ceil(m/2) − 1).

Similarly, if the length of a sub-band x[i,j] is n_i, then the length of the ith level WP
decomposition x[i] is N_i = n_i × 2^i, where i ≥ 1, 0 ≤ j < 2^i, and 2^i is the number of
sub-bands with different frequencies at the WP decomposition level i. The length of a
sub-band x[i+1, j] on the (i+1)th WP decomposition is
n_{i+1} = ceil(n_i/2) + ceil(m/2) − 1, and the length of the (i+1)th level WP decomposition
x[i+1] is

    N_{i+1} = n_{i+1} \times 2^{i+1}.    (13)
original signal:   x
1st level:         x[1,0]   x[1,1]
2nd level:         x[2,0]   x[2,1]   x[2,2]   x[2,3]

Fig. 2. WP decomposition tree: principles of WP. A signal can be decomposed into a
low-frequency approximation signal and high-frequency details, and can be iteratively
re-decomposed to provide increasingly accurate frequency resolution.

We investigate the feature analysis problem of WP-decomposed stellar spectra using the
following typical basis functions: Biorthogonal basis (bior), Coiflets (coif), Daubechies
basis (db), Haar (haar), ReverseBior (rbio), and Symlets (sym) (Mallat 1989, 2009;
Daubechies 1992). The filters associated with these functions are, respectively, referred to
as filter(bior2.2), filter(coif4), filter(db4), filter(haar), filter(rbio4.4), and
filter(sym4)^6 (see the Matlab documentation of wavelet filters: wfilters 2014). The
respective filter lengths are 6, 24, 8, 2, 10, and 8. The length of all our spectra is
n = 3821. Based on Equation (13), the lengths of the WP decompositions
{x[i], i = 1, ..., 6} are presented in Table 2 for the above-mentioned basis functions.
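The recursion n_{i+1} = ceil(n_i/2) + ceil(m/2) − 1 together with N_i = n_i × 2^i is easy to evaluate. The short sketch below recomputes the decomposition lengths for n = 3821 and the six filter lengths quoted above; the function and dictionary names are illustrative.

```python
import math

# Filter lengths m quoted in the text for the six basis functions.
FILTER_LENGTHS = {
    "bior2.2": 6, "coif4": 24, "db4": 8, "haar": 2, "rbio4.4": 10, "sym4": 8,
}


def subband_length(n, m, level):
    """Length n_i of one sub-band after `level` rounds of filtering and downsampling."""
    n_i = n
    for _ in range(level):
        n_i = math.ceil(n_i / 2) + math.ceil(m / 2) - 1
    return n_i


def wpd_length(n, m, level):
    """Total length N_i = n_i * 2**i of the level-i WP decomposition."""
    return subband_length(n, m, level) * 2 ** level


# Decomposition lengths for levels 1..6 of a spectrum with n = 3821 fluxes.
table = {name: [wpd_length(3821, m, i) for i in range(1, 7)]
         for name, m in FILTER_LENGTHS.items()}
```

Longer filters add boundary coefficients at every level, so the decomposition grows beyond the original 3821 samples as the level increases, most visibly for coif4.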


4.2.3. Reconstruction and Visualization

For a level i in the WP decomposition tree (Fig. 2), WPD is a mapping
wpdec: R^n → R^{N_i} from the spectral space R^n to a WPD space R^{N_i}:

    wpdec(x, i) = x[i],    (14)

where x ∈ R^n is a spectrum and x[i] ∈ R^{N_i}. Based on the theory of WP (Daubechies 1992;
Mallat 2009), we can also reconstruct the spectrum x from the WP decomposition x[i] by a
mapping wprec: R^{N_i} → R^n (see the Matlab documentation of WP reconstruction:
wprec 2014); this process is referred to as WP reconstruction.

Suppose that j_0 is an integer satisfying 0 ≤ j_0 < 2^i, and
s[i] = {s[i,j], j = 0, ..., 2^i − 1}, where s[i,j_0] = x[i,j_0] and s[i,j] = 0 (the zero
vector^7) if j ≠ j_0 and 0 ≤ j < 2^i. Using WP reconstruction, we can map s[i] to a vector
wprec(s[i]) ∈ R^n to visualize the frequency sub-band x[i,j_0] in spectral space (Fig. 3).
This visualizing technique is widely used in related research.

^6 There are multiple variants for basis functions bior, coif, db, rbio, and sym in the
implementation of the Matlab wavelet toolbox. The numbers behind them are the indexes of the
variants.
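The visualization trick, keeping one sub-band and zeroing the others before reconstructing, can be sketched with the Haar wavelet. This is a self-contained stand-in for Matlab's wpdec/wprec, not a reimplementation of them: the Haar filters, the power-of-two signal length, and all function names are assumptions.

```python
import numpy as np


def haar_split(x):
    """Haar approximation and detail of an even-length signal."""
    return (x[0::2] + x[1::2]) / np.sqrt(2.0), (x[0::2] - x[1::2]) / np.sqrt(2.0)


def haar_merge(approx, detail):
    """Inverse of haar_split."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x


def wp_decompose(x, level):
    """Sub-bands x[level, 0..2**level - 1] of a signal whose length is a power of two."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(level):
        nodes = [band for node in nodes for band in haar_split(node)]
    return nodes


def wp_reconstruct(nodes):
    """Rebuild the signal by merging sibling sub-bands level by level."""
    nodes = list(nodes)
    while len(nodes) > 1:
        nodes = [haar_merge(nodes[k], nodes[k + 1]) for k in range(0, len(nodes), 2)]
    return nodes[0]


def subband_in_signal_space(nodes, j0):
    """Reconstruct from s[i]: sub-band j0 kept, every other sub-band set to zero."""
    kept = [band if j == j0 else np.zeros_like(band) for j, band in enumerate(nodes)]
    return wp_reconstruct(kept)


rng = np.random.default_rng(3)
x = rng.standard_normal(64)
nodes = wp_decompose(x, 3)
parts = [subband_in_signal_space(nodes, j) for j in range(len(nodes))]
```

Because reconstruction is linear, the single-band images add up to the original signal, so each one shows exactly what its frequency sub-band contributes to the spectrum.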


4.2.4. Wavelength/Time-frequency Decomposition

A WPD coefficient can be visualized in the spectral space based on the method in
Section 4.2.3 (Fig. 4). It can be shown that the energy of a WPD coefficient exists in a
local and limited area (Fig. 4), and a spectrum x can be reconstructed from the coefficients
on a decomposition level x[i] (Section 4.2.3 and Mallat (2009)). Therefore, in addition to
decomposing a signal based on frequency (Section 4.2, Fig. 3), WP also implements wavelength
decomposition. These characteristics are called wavelength/time-frequency localization or
wavelength/time-frequency analysis.^8

In this work, the wavelength position of a coefficient of WPD is represented by the center
of the corresponding non-zero area in spectral space (Fig. 4).
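The wavelength position of a single coefficient can be sketched as follows: reconstruct a unit coefficient with all others set to zero, locate the non-zero area, and convert its first, central, and last pixels to wavelengths on the log grid of Section 2.1. The Haar wavelet, the center taken in pixel index (that is, in log-wavelength), and the function names are assumptions of this sketch.

```python
import numpy as np

LOG_START, LOG_STEP = 3.581862, 0.0001   # log10(wavelength) grid of Section 2.1


def haar_merge(approx, detail):
    """One level of inverse Haar transform."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x


def coefficient_footprint(level, band, k, n_coeff):
    """Spectral-space image of the unit coefficient number k of sub-band x[level, band]."""
    nodes = [np.zeros(n_coeff) for _ in range(2 ** level)]
    nodes[band][k] = 1.0
    while len(nodes) > 1:
        nodes = [haar_merge(nodes[q], nodes[q + 1]) for q in range(0, len(nodes), 2)]
    return nodes[0]


def typical_wavelength(footprint):
    """(start, center, end) wavelengths in Angstrom of the non-zero area."""
    idx = np.flatnonzero(footprint)
    first, last = int(idx[0]), int(idx[-1])
    center = 0.5 * (first + last)
    to_wave = lambda pix: 10.0 ** (LOG_START + LOG_STEP * pix)
    return to_wave(first), to_wave(center), to_wave(last)


footprint = coefficient_footprint(level=3, band=5, k=2, n_coeff=8)
start, center, end = typical_wavelength(footprint)
```

On a log-wavelength grid the central wavelength obtained this way is the geometric mean of the two edges.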


4.3. Feature Selection 

This subsection focuses on selecting a linearly supporting subset of WP
components/coefficients to estimate the atmospheric parameters Teff, log g, and


^7 0 is a zero vector sharing the same length as x[i,j].

^8 In the information processing community, a signal is usually composed of some detected
energy values on a series of time points, and thus, semantically, the above-mentioned
characteristics of WPs are usually referred to as time-frequency localization or
time-frequency analysis.
Table 2: Length of the wavelet packet decomposition of a spectrum used in this work, based
on six typical wavelet bases.

LWPD  filter(bior2.2)  filter(coif4)  filter(db4)  filter(haar)  filter(rbio4.4)  filter(sym4)
1     3826             3844           3828         3822          3830             3828
2     3836             3888           3840         3824          3848             3840
3     3856             3976           3864         3824          3880             3864
4     3888             4160           3920         3824          3952             3920
5     3968             4512           4032         3840          4096             4032
6     4096             5248           4224         3840          4352             4224

Note. LWPD: level of wavelet packet decomposition.



Fig. 4. Wavelength/time localization of wavelet packet decomposition. This is the
visualization of three wavelet packet coefficients of the spectrum in Fig. 3(a). There are
three areas with non-zero energy (non-zero areas), one for each of the three coefficients.
Table 3: The Detected Typical Wavelength and Frequency for Estimating Atmospheric Parameters from
Stellar Spectra

(a) The Detected Features for Teff Based on Basis Function rbio with the Optimal Decomposition Level 5.

Label  TW λ (Å)                IF    Label  TW λ (Å)                IF    Label  TW λ (Å)                IF
T1     [3825.6,3936.4,4050.4]   0    T2     [4118.1,4237.4,4360.1]   0    T3     [4633.4,4767.6,4905.7]   0
T4     [4737.0,4874.2,5015.3]   0    T5     [4987.7,5132.2,5280.8]   0    T6     [5061.7,5208.3,5359.2]   0
T7     [3818.6,3903.9,3991.2]   1    T8     [3998.5,4099.2,4202.4]   1    T9     [4241.3,4348.1,4457.6]   1
T10    [4737.0,4856.2,4978.5]   1    T11    [4772.0,4892.2,5015.3]   1    T12    [5061.7,5189.2,5319.9]   1
T13    [5099.2,5227.6,5359.2]   1    T14    [6407.7,6569.0,6734.4]   1    T15    [3839.7,3943.7,4050.4]   2
T16    [5006.1,5141.6,5280.8]   2    T17    [5080.4,5218.0,5359.2]   2    T18    [5310.1,5453.8,5601.4]   2
T19    [3818.6,3903.9,3991.2]   3    T20    [4754.4,4865.2,4978.5]   3    T21    [3818.6,3875.3,3932.8]   6
T22    [3846.8,3947.3,4050.4]   6    T23    [3850.3,3934.6,4020.7]  15

(b) The Detected Features for log g Based on Basis Function coif with the Optimal Decomposition Level 6.

Label  TW λ (Å)                IF    Label  TW λ (Å)                IF    Label  TW λ (Å)                IF
L1     [3818.6,4264.8,4762.1]   0    L2     [3818.6,4360.1,4977.4]   0    L3     [3818.6,4392.4,5051.3]   0
L4     [3818.6,4456.6,5202.4]   1    L5     [3894.9,4601.5,5437.5]   1    L6     [3952.8,4670.9,5518.2]   1
L7     [4382.3,5178.5,6117.9]   1    L8     [5547.5,6553.9,7744.6]   1    L9     [5629.9,6652.7,7859.6]   1
L10    [3818.6,3846.8,3874.4]   1    L11    [3818.6,4264.8,4762.1]   1    L12    [3952.8,4670.9,5518.2]   2
L13    [4447.3,5255.3,6208.7]   2    L14    [4513.4,5333.3,6300.9]   2    L15    [5307.6,6271.9,7409.7]   2
L16    [7449.0,8279.4,9202.4]   3    L17    [7559.6,8340.7,9202.4]   3    L18    [3818.6,4202.4,4623.8]   3
L19    [4858.5,5741.2,6782.7]   3    L20    [7232.7,8158.3,9202.4]   3    L21    [7340.1,8218.6,9202.4]   4
L22    [3818.6,3846.8,3874.4]   4    L23    [3818.6,4424.9,5126.3]   4    L24    [3894.9,4601.5,5437.5]   4
L25    [7449.0,8279.4,9202.4]   5    L26    [7559.6,8340.7,9202.4]   5    L27    [3818.6,4424.9,5126.3]   5
L28    [7126.9,8098.4,9202.4]   6    L29    [3818.6,4233.5,4692.5]   6    L30    [3818.6,4295.4,4832.8]   6
L31    [7340.1,8218.6,9202.4]   6    L32    [3818.6,4171.6,4556.2]   7    L33    [3818.6,4392.4,5051.3]   8
L34    [4255.0,5028.1,5940.2]   8    L35    [5003.8,5911.5,6985.5]   8    L36    [6919.9,7979.9,9202.4]   8
L37    [7126.9,8098.4,9202.4]   8    L38    [7340.1,8218.6,9202.4]   9    L39    [3818.6,4392.4,5051.3]   9
L40    [4071.0,4810.6,5683.3]   9    L41    [4513.4,5333.3,6300.9]   9    L42    [3818.6,4140.0,4489.5]   9
L43    [6241.7,7375.6,8713.6]  10    L44    [3818.6,4295.4,4832.8]  11    L45    [4131.4,4882.0,5767.7]  11
L46    [4255.0,5028.1,5940.2]  11    L47    [7340.1,8218.6,9202.4]  12    L48    [4192.8,4954.5,5853.3]  12
L49    [4382.3,5178.5,6117.9]  12    L50    [7340.1,8218.6,9202.4]  12    L51    [7449.0,8279.4,9202.4]  16
L52    [3818.6,4328.1,4904.6]  19    L53    [3818.6,4490.6,5279.6]  20    L54    [4192.8,4954.5,5853.3]  20
L55    [7022.6,8039.0,9202.4]  20    L56    [4513.4,5333.3,6300.9]  21    L57    [3818.6,4050.4,4295.4]  21
L58    [4580.4,5412.5,6394.4]  21    L59    [3818.6,4424.9,5126.3]  21    L60    [4255.0,5028.1,5940.2]  24
L61    [3818.6,4456.6,5202.4]  25    L62    [4318.2,5101.5,6028.4]  25

(c) The Detected Features for [Fe/H] Based on Basis Function rbio with the Optimal Decomposition Level 4.

Label  TW λ (Å)                IF    Label  TW λ (Å)                IF    Label  TW λ (Å)                IF
F1     [3882.4,3936.4,3991.2]   0    F2     [4449.4,4511.3,4574.0]   0    F3     [4565.6,4629.1,4693.5]   0
F4     [4684.9,4750.1,4816.1]   0    F5     [4789.6,4856.2,4923.8]   0    F6     [4807.3,4874.2,4942.0]   0
F7     [3896.7,3943.7,3991.2]   1    F8     [4043.0,4091.7,4141.0]   1    F9     [4118.1,4167.7,4217.9]   1
F10    [4498.8,4553.0,4607.9]   1    F11    [4532.1,4586.7,4641.9]   1    F12    [4616.4,4672.0,4728.2]   1
F13    [4650.5,4706.5,4763.2]   1    F14    [4667.7,4723.9,4780.8]   1    F15    [4719.5,4776.4,4833.9]   1
F16    [4789.6,4847.3,4905.7]   1    F17    [4825.0,4883.1,4942.0]   1    F18    [4842.8,4901.2,4960.2]   1
F19    [4932.9,4992.3,5052.4]   1    F20    [4969.4,5029.2,5089.8]   1    F21    [5024.6,5085.1,5146.4]   1
F22    [5043.1,5103.9,5165.4]   1    F23    [5061.7,5122.7,5184.4]   1    F24    [5080.4,5141.6,5203.6]   1
F25    [5099.2,5160.6,5222.8]   1    F26    [5118.0,5179.6,5242.0]   1    F27    [5174.9,5237.2,5300.3]   1
F28    [5194.0,5256.5,5319.9]   1    F29    [5290.5,5354.3,5418.8]   1    F30    [5591.1,5658.5,5726.6]   1
F31    [5632.5,5700.3,5769.0]   1    F32    [8415.9,8517.3,8619.9]   1    F33    [8447.0,8548.7,8651.7]   1
F34    [4424.9,4482.3,4540.5]   2    F35    [4507.1,4565.6,4624.9]   2    F36    [4960.2,5024.6,5089.8]   2
F37    [4978.5,5043.1,5108.6]   2    F38    [5033.8,5099.2,5165.4]   2    F39    [5146.4,5213.1,5280.8]   2
F40    [5203.6,5271.1,5339.5]   2    F41    [5319.9,5388.9,5458.8]   2    F42    [3889.6,3932.8,3976.5]   3
F43    [4005.9,4050.4,4095.4]   3    F44    [4202.4,4249.1,4296.4]   3    F45    [4249.1,4296.4,4344.1]   3
F46    [4344.1,4392.4,4441.2]   3    F47    [4408.6,4457.6,4507.1]   3    F48    [4424.9,4474.0,4523.8]   3
F49    [4490.6,4540.5,4590.9]   3    F50    [4798.4,4851.8,4905.7]   3    F51    [4869.7,4923.8,4978.5]   3
F52    [4923.8,4978.5,5033.8]   3    F53    [4942.0,4996.9,5052.4]   3    F54    [5033.8,5089.8,5146.4]   3
F55    [5300.3,5359.2,5418.8]   3    F56    [5418.8,5479.0,5539.9]   3    F57    [5580.8,5642.9,5705.6]   3
F58    [7689.5,7775.0,7861.4]   3    F59    [5103.9,5172.5,5242.0]   4    F60    [4175.4,4227.7,4280.6]   6
F61    [4364.2,4418.8,4474.0]   6    F62    [4494.7,4550.9,4607.9]   6    F63    [4856.2,4917.0,4978.5]   6
F64    [5246.9,5312.5,5379.0]   6    F65    [4129.5,4173.5,4217.9]   7    F66    [4284.5,4330.1,4376.2]   7
F67    [5132.2,5186.8,5242.0]   7    F68    [5305.2,5361.7,5418.8]   7

Note. TW: typical wavelength position represented by a three-dimensional vector [a, b, c], where a, b, c are respectively
the starting wavelength, central wavelength, and ending wavelength, and log10 b = (log10 a + log10 c)/2. IF: index of sub-bands
with different frequencies. In (a), 0.56% of the 4096 wavelet packet components/coefficients are extracted to estimate Teff;
in (b), 1.18% of the 5248 wavelet packet components/coefficients are extracted to estimate log g; in (c), 1.72% of the 3952 wavelet
packet components/coefficients are extracted to estimate [Fe/H]. The selection of the basis function and the decomposition level
is discussed in Section 6.1.
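
The relation log10 b = (log10 a + log10 c)/2 in the note to Table 3 says that the central wavelength is the geometric mean of the two ends of the interval. The short sketch below recomputes b for three rows copied from Table 3. The function name `central_wavelength` is only an illustrative choice, and agreement with the tabulated centers should be expected only up to the rounding of the table.

```python
import math

def central_wavelength(a, c):
    """Central wavelength b with log10(b) = (log10(a) + log10(c)) / 2,
    i.e. the geometric mean of the starting and ending wavelengths."""
    return 10 ** ((math.log10(a) + math.log10(c)) / 2)

# (start, tabulated center, end) in Angstrom, copied from Table 3
rows = {
    "T1": (3825.6, 3936.4, 4050.4),
    "T14": (6407.7, 6569.0, 6734.4),
    "F1": (3882.4, 3936.4, 3991.2),
}

for label, (a, b, c) in rows.items():
    print(label, "recomputed:", round(central_wavelength(a, c), 1), "tabulated:", b)
```

Working in log10 of wavelength is natural here because the intervals in Table 3 are symmetric in log wavelength rather than in wavelength.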
[Fe/H]. This process is referred to as a feature
selection problem in machine learning.


The high-frequency WP-decomposed components
usually have a larger probability of being affected
by noise than the low-frequency components. In the
literature, therefore, features are usually selected by
discarding as noise those components with frequencies
larger than a preset threshold (Lu et al. 2013).
Assessing this threshold is subjective. Furthermore,
apart from noise, there exists a high level
of redundancy in a stellar spectrum when estimating
atmospheric parameters (Li et al. 2014).


Therefore, we analyze the correlation between
WP components and the atmospheric parameters to
be estimated and detect representative spectral features
using LASSO(LARS)bs (described in an earlier section). The WP
components selected as useful features are presented
in Table 3 and Fig. 5, and more technical details are
discussed in Section 6.


Visualization of the features. Based on the results
in Table 3, a spectrum x should be decomposed
to the fifth level, x[5], to estimate Teff. A vector
s[5] can be constructed from x[5] by keeping the
elements of x[5] corresponding to the features in
Table 3(a) and resetting the other elements of x[5] to zero.
Thus, we can visualize the features of x in
spectral space through WP reconstruction wprec(s[5])
(Section 4.2.3, Fig. 5(b)). Similarly, the features in
Tables 3(b) and (c) can also be visualized in the
spectral space (Fig. 5(d), Fig. 5(f)) through WP
reconstruction.
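
The keep-and-reconstruct step can be sketched in a few lines. The example below is only an illustration under simplifying assumptions: it uses the Haar basis instead of the rbio and coif bases of Table 3, it requires the signal length to be a multiple of 2**level, and the sub-bands come out in natural (filter-bank) order rather than frequency order. The names `wpd`, `wprec`, and `keep_features` are illustrative and are not taken from any library.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_split(x):
    """One Haar analysis step: returns (approximation, detail)."""
    half = len(x) // 2
    a = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(half)]
    d = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(half)]
    return a, d

def haar_merge(a, d):
    """Inverse of haar_split."""
    x = []
    for ai, di in zip(a, d):
        x.append((ai + di) / SQRT2)
        x.append((ai - di) / SQRT2)
    return x

def wpd(x, level):
    """Full wavelet packet decomposition: 2**level sub-bands, natural order."""
    bands = [list(x)]
    for _ in range(level):
        nxt = []
        for b in bands:
            nxt.extend(haar_split(b))   # every sub-band is split again
        bands = nxt
    return bands

def wprec(bands):
    """Wavelet packet reconstruction from the sub-bands returned by wpd."""
    bands = [list(b) for b in bands]
    while len(bands) > 1:
        bands = [haar_merge(bands[i], bands[i + 1])
                 for i in range(0, len(bands), 2)]
    return bands[0]

def keep_features(bands, selected):
    """Keep the coefficients listed in `selected` (a set of
    (sub_band_index, position) pairs) and reset all others to zero."""
    return [[c if (j, k) in selected else 0.0 for k, c in enumerate(b)]
            for j, b in enumerate(bands)]

# Toy "spectrum" of 16 samples, decomposed to level 2 (4 sub-bands of 4).
x = [float(i % 5) for i in range(16)]
bands = wpd(x, 2)
s = keep_features(bands, {(1, 2)})      # keep a single coefficient
visualization = wprec(s)                # non-zero only on samples 8..11
print([round(v, 3) for v in visualization])
```

With the Haar basis the reconstructed feature is exactly confined to a block of 2**level samples; longer filters such as rbio or coif give wider, overlapping non-zero areas, which is why the intervals in Table 3 overlap.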


We find that the extracted features are a subset
of WP components/coefficients in some lower
sub-bands (Table 3, Fig. 6). In other words, not only are
some of the sub-bands with higher frequency
ineffective, but many components in the sub-bands
with lower frequency also appear redundant. Further
discussion and the corresponding results are presented
in Section 6.


To estimate Teff, a spectrum is decomposed into
2^5 = 32 sub-bands with a frequency index from 0
to 31; there exist 120 components in each sub-band,
and the detected features come only from sub-bands
0, 1, 2, 3, 6, and 16: this means that more than 93.33% of the
components/coefficients are redundant or noise in
each sub-band (Fig. 6(a)). Similarly, to estimate
log g, a spectrum is decomposed into 64 sub-bands, and
more than 85% of the WP components are redundant
or noise in each sub-band (Fig. 6(b)). To
estimate [Fe/H], a spectrum is decomposed into 16
sub-bands, and more than 88.7% of the WP components
are redundant or noise in each sub-band
(Fig. 6(c)). An interesting phenomenon is that all
components in the lowest frequency sub-band are
redundant or noise when estimating log g. Therefore,
the effectiveness of a WP component depends both
on its frequency and on its wavelength (Table 3).
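
As a small worked check, the lower bounds on redundancy per sub-band can be recomputed from the IF column of Table 3 together with the number of components per sub-band stated in the caption of Fig. 6 (120 for Teff, 239 for [Fe/H]). The helper name `min_redundancy` is illustrative. The busiest sub-band determines the bound, and the result reproduces the 93.33% and 88.7% quoted in that caption.

```python
from collections import Counter

def min_redundancy(sub_band_indices, components_per_band):
    """Fraction of unused components in the sub-band that holds the
    largest number of selected features (the worst case over sub-bands)."""
    counts = Counter(sub_band_indices)
    busiest = max(counts.values())
    return 1.0 - busiest / components_per_band

# IF column of Table 3(a): sub-band index of each of the 23 Teff features
teff_if = [0] * 6 + [1] * 8 + [2] * 4 + [3] * 2 + [6] * 2 + [15]
# IF column of Table 3(c): sub-band index of each of the 68 [Fe/H] features
feh_if = [0] * 6 + [1] * 27 + [2] * 8 + [3] * 17 + [4] + [6] * 5 + [7] * 4

print(f"Teff:   at least {100 * min_redundancy(teff_if, 120):.2f}% unused per sub-band")
print(f"[Fe/H]: at least {100 * min_redundancy(feh_if, 239):.2f}% unused per sub-band")
```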

Based on the detected features in Table 3, three
atmospheric parameter estimate models can be
established using OLS (Equations (3) and (4)). The
coefficients of these models are given in Table 4.
They quantify the association between the detected
spectral features and the atmospheric parameter to
be estimated. As already mentioned in Section 3.1,
these coefficients can be interpreted as the average
effect of a one-unit increase in a spectral feature
(James et al. 2013). For example, the coefficient
w1 is 0.2379 in the Teff estimate model (Table 4);
therefore, if the spectral feature T1 increases by one unit
with all other features {Tj, j = 2, ..., 23} remaining
fixed, then log Teff will increase by 0.2379.
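
This interpretation follows directly from the linearity of the model, as the minimal sketch below shows. It is truncated to the first three Teff coefficients of Table 4(a) for brevity; the feature values and the zero intercept are made-up placeholders, not data from this work, and `predict` is an illustrative name.

```python
def predict(weights, features, intercept=0.0):
    """Linear model: intercept + sum_i w_i * x_i."""
    return intercept + sum(w * x for w, x in zip(weights, features))

# Coefficients of T1, T2, T3 from Table 4(a); feature values are placeholders.
w = [0.2379, 0.4505, 0.4819]
x = [0.10, -0.20, 0.05]

# Raise T1 by one unit and keep the other features fixed.
x_shifted = [x[0] + 1.0] + x[1:]
delta = predict(w, x_shifted) - predict(w, x)
print(f"change in predicted log Teff: {delta:.4f}")
```

The change equals w1 whatever the values of the other features are, which is what makes the coefficients in Table 4 directly readable as per-feature contributions.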


4.4. Characteristic - Good Interpretability 


Due to the time-frequency localization
of a wavelet basis function, every detected
feature has a specific wavelength position (Table 3,
Fig. 6(d), Fig. 6(e), and Fig. 6(f)), which helps to
trace back the physical factors at work and to evaluate
their contributions to the atmospheric parameter
estimate from stellar spectra (Table 4). For example,
Hγ is a line sensitive to surface temperature (T9 in
Tables 3 and 4), Hα is sensitive to both surface
temperature and gravity (T14 and L9 in Tables 3 and
4), Ca II K is sensitive to both surface temperature
and metallicity (T1, T23, F42 in Tables 3 and 4),
and Hδ is sensitive to both surface temperature and
metallicity (T8 and F8 in Tables 3 and 4).


Note, however, that the selected features in
Table 3 may span a somewhat larger wavelength width
than is traditionally used for stellar absorption lines.
Thus, it could be asked whether some selected
features may physically correspond to spectral blends
rather than to single lines, which would explain
why some wavelength-identified features unexpectedly
appear sensitive to an atmospheric parameter:
this is the case for Hδ, for example, which should not,
by itself and considered alone, be sensitive to
metallicity. We also underline that the present study does
Table 4: Coefficients of the Atmospheric Parameter Estimation Model Learned by OLS from SDSS Spectra
(More Details of the Experiment are Presented in Section 5.1)

(a) Detected Features for Teff Based on Wavelet Basis Function rbio with the Optimal Decomposition Level 5.

Label        wi    Label        wi    Label        wi    Label        wi
T1       0.2379    T2       0.4505    T3       0.4819    T4      -1.8345
T5       1.0740    T6       0.7236    T7       0.3264    T8       0.4558
T9       0.7899    T10      1.0640    T11     -0.9495    T12      0.8358
T13      0.8650    T14      1.5004    T15      0.2825    T16      0.5646
T17      0.5911    T18     -0.4539    T19      0.3262    T20     -0.9225
T21     -0.2100    T22      0.2366    T23      0.4507

(b) Detected Features for log g Based on Wavelet Basis Function coif with the Optimal Decomposition Level 6.

Label        wi    Label        wi    Label        wi    Label        wi
L1     -14.9500    L2      11.9097    L3     -15.9727    L4     -29.8595
L5      20.9904    L6     -12.1664    L7      48.9051    L8     124.5594
L9     -23.4959    L10      4.2752    L11      9.3749    L12    -11.0132
L13    -26.4751    L14    -35.4924    L15     20.5712    L16     26.5010
L17     32.2221    L18     11.0103    L19    -22.3564    L20     11.6441
L21    -14.6373    L22     -6.9629    L23    -10.0933    L24    -11.4390
L25    -26.8052    L26    -25.4320    L27      9.7027    L28    -21.6698
L29     16.4676    L30     -9.1541    L31    -47.8560    L32     -7.0705
L33    -18.4181    L34    -19.0691    L35    -19.7537    L36    -43.4952
L37    -11.2162    L38    -12.9840    L39      8.6831    L40     13.4435
L41    -13.0983    L42     -5.3274    L43    -19.2899    L44     -9.5705
L45      7.2348    L46    -12.2654    L47     16.4605    L48    -14.0761
L49     34.1498    L50    -16.5275    L51     -8.4650    L52    -14.5751
L53    -19.9664    L54    -28.4143    L55    -17.2212    L56     18.7294
L57     -3.6372    L58    -28.5637    L59     -8.3460    L60     38.7862
L61    -15.1417    L62    -23.3872

(c) Detected Features for [Fe/H] Based on Wavelet Basis Function rbio with the Optimal Decomposition Level 4.

Label        wi    Label        wi    Label        wi    Label        wi
F1     -15.9393    F2      14.4512    F3      10.0133    F4      14.5021
F5     -36.4275    F6     -22.7435    F7     -20.0956    F8      -3.5740
F9      -4.2593    F10     15.7326    F11     15.0760    F12      9.5922
F13     15.1001    F14    -10.5834    F15    -16.4978    F16    -19.8914
F17     -6.1936    F18    -13.9105    F19     -9.9439    F20    -16.1353
F21     10.6011    F22     14.8693    F23     -6.8806    F24     14.3556
F25    -19.2167    F26    -43.5510    F27      3.8668    F28    -19.1643
F29     -7.0540    F30     13.4940    F31    -13.5002    F32     -8.6885
F33      9.5827    F34     17.6601    F35      5.7138    F36    -12.3663
F37      5.2848    F38      6.5740    F39    -21.8866    F40    -23.7674
F41    -16.2051    F42    -10.6150    F43      4.5351    F44     -5.7992
F45      7.4311    F46      1.2759    F47     -8.2072    F48      5.1707
F49      1.6828    F50     13.3047    F51    -11.3030    F52      8.5673
F53      7.0954    F54      6.1268    F55     15.5904    F56     -5.3280
F57    -16.7121    F58    -10.0843    F59    -17.2424    F60     -2.3355
F61     -4.8779    F62     -8.4915    F63     10.0789    F64     15.6214
F65      4.5253    F66     -6.8905    F67     17.1603    F68     -8.5783

Note. More details of the experiment are presented in Section 5.1. The coefficients predict the average effect of the corresponding
spectral feature on the atmospheric parameter to be estimated. The labels of spectral features are defined in Table 3.

not take into account the effects of spectral resolution
on the effectiveness of the feature selection.


4.5. Physical Dependence of the Detected
Features and Their Contributions


The detected features and their contributions
depend on the range of atmospheric parameters to be
investigated. The following examples pertain to the
effective temperature determination. Let us split the
SDSS training spectra set (10,000 spectra) into the four
following subsets based on the effective temperature
derived by the SDSS SSPP:

S1: the spectra with Teff < 5200 K,

S2: the spectra with 5200 K < Teff < 6000 K,

S3: the spectra with 6000 K < Teff < 7500 K,

S4: the spectra with Teff > 7500 K.

Based on the features selected by LASSO(LARS)bs
in Table 3(a), we are led to four models M1, M2,
M3, and M4 corresponding to the four training
subsets S1, S2, S3, and S4. The coefficients of these four
models are presented in Table 5. The
contribution/coefficient of each detected feature
Ti clearly depends on the range of effective temperatures:
for example, the feature associated with the Ca II
K line (T1, T23) is only weakly pertinent for stars
hotter than 6000 K.


5. ESTIMATING THE ATMOSPHERIC
PARAMETERS

5.1. Performance on SDSS Spectra 


Based on the detected features in Table 3, we
can estimate the atmospheric parameters using the
linear model defined in Equation (3), which can
be learned from the training set (Section 2.1) by
the OLS method in Equation (4) and the SVR method
with a linear kernel (SVRl) (Schölkopf et al. 2002;
Smola et al. 2004). The performance on the test set
(Section 2.1) is presented in Table 6(a). When using
SVRl, there is a regularization parameter C that has
to be preset (Chang and Lin 2001; Schölkopf et al.
2002; Smola et al. 2004), and we optimized this
parameter using the validation set (Section 2.1).


On the test set of 30,000 SDSS spectra, the formal
MAE consistencies of the proposed scheme are
0.0062 dex for log Teff (83 K for Teff), 0.2345 dex
for log g, and 0.1564 dex for [Fe/H], where the MAE
evaluation method is defined in Equation (|3]). There- 


[Figure 5: plots not reproduced. Recoverable panel titles: (a) Five spectra with different Teff.
(e) Five spectra with different [Fe/H]. (f) Features of the spectra in Fig. 5(e).]

Fig. 5. Visualization of the detected features (Section 4.3). Fig. 5(b), Fig. 5(d), and Fig. 5(f) present the
features in Table 3(a), Table 3(b), and Table 3(c) for the spectra drawn, respectively, in Fig. 5(a), Fig.
5(c), and Fig. 5(e). For example, the curve labeled with Teff = 9667 K in Fig. 5(b) is the visualization of
the features in Table 3(a) for the spectrum labeled with Teff = 9667 K in Fig. 5(a).
Table 5: Dependence of the Contribution of the Detected Features on the Range of Effective Temperature

Label    wi(M1)     wi(M2)     wi(M3)     wi(M4)
T1       0.2899     0.0951     0.0005     0.0672
T2       0.1103     0.3226     0.3159     0.0838
T3       0.9069     0.8461     0.6738     0.6006
T4      -0.7189    -2.6379    -2.1906    -1.1374
T5      -0.5341     0.8141     0.8674     1.3047
T6      -0.4443     1.0845     0.6762     0.1175
T7      -0.2309     0.2490     0.4674    -0.1354
T8       0.3543     0.4148     0.7816     0.5406
T9       0.6276     0.7579     0.9733     0.1057
T10      0.9216     1.1588     1.3794    -0.0292
T11     -3.0122    -0.5011    -0.5016    -0.4103
T12      0.5619    -0.0543     0.6089     2.1177
T13     -0.0779     0.1908     0.7772     1.3919
T14     -1.3688     0.7778     2.3492     1.2935
T15      0.8306     0.1825     0.2503     0.1015
T16     -0.5552     0.0912     0.7270     1.2507
T17     -0.9458     0.4172     0.1666     0.4198
T18      0.6194    -0.1032    -0.8771    -0.0588
T19      0.0948     0.1530     0.2582     0.0372
T20      1.2295    -0.3203    -2.1724    -0.7095
T21     -0.4782    -0.0629    -0.0731    -0.0845
T22      0.7560    -0.0193     0.0556    -0.0828
T23      0.8643    -0.1538     0.0331     0.0215

Note. M1, M2, M3, and M4 are defined in Section 4.5. The labels of the spectral features are defined in
Table 3. wi(Mj) represents the coefficient of model Mj.
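
The temperature split of Section 4.5 and the way Table 5 is read can be sketched as follows. The function name `teff_subset` is illustrative, and treating each lower boundary as inclusive is an assumption, since the boundaries are written with strict inequalities in the text. The T1 coefficients are copied from Table 5.

```python
def teff_subset(teff_kelvin):
    """Return k in 1..4 for the subsets S1..S4 of Section 4.5.
    Assumption: each lower boundary is treated as inclusive."""
    if teff_kelvin < 5200:
        return 1
    if teff_kelvin < 6000:
        return 2
    if teff_kelvin < 7500:
        return 3
    return 4

# Coefficient of T1 (Ca II K region) in models M1..M4, copied from Table 5
w_T1 = {1: 0.2899, 2: 0.0951, 3: 0.0005, 4: 0.0672}

for teff in (4800, 5777, 6500, 9667):
    k = teff_subset(teff)
    print(f"Teff = {teff} K -> S{k}, coefficient of T1 in M{k}: {w_T1[k]}")
```

The drop of the T1 coefficient from 0.2899 in M1 to 0.0005 in M3 is the numerical counterpart of the statement that the Ca II K feature is only weakly pertinent for stars hotter than 6000 K.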
[Figure 6: plots not reproduced. Panels (a), (b), (c): horizontal axis is the index of the wavelet packet sub-band.
Panels (d), (e), (f): horizontal axis is the wavelength (Å) of the detected features.]

Fig. 6. Distribution of the detected features in Table 3. (a) and (d): There are 120 wavelet packet
components in each sub-band and more than 93.33% of the components are redundancy and noise in each
sub-band. (b) and (e): There are 60 wavelet packet components in each sub-band and more than 85% of
the components are redundancy and noise in each sub-band. (c) and (f): There are 239 wavelet packet
components in each sub-band and more than 88.7% of the components are redundancy and noise in each
sub-band. (a) Features for Teff; (b) features for log g; (c) features for [Fe/H]; (d) features for Teff; (e)
features for log g; and (f) features for [Fe/H].
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fore, the detected features provide excellent linear 
support for estimating atmospheric parameters Teff,
log g, and [Fe/H].

In related work in the literature, the authors
use various performance evaluation methods. In
order to compare better with those sources, we have
also evaluated the performance of the proposed
scheme based on the ME and SD measures: the
results are presented in Table 6(a). Direct comparisons
with published work are given in Section 5.3.
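
For readers who wish to reproduce the three evaluation measures, a minimal sketch is given below. It assumes the usual definitions: MAE as the mean of absolute errors, ME as the mean of signed errors, and SD as the (population) standard deviation of the errors about ME. These are assumptions of the sketch and may differ in detail from the defining equations of this paper; the sample values are made up.

```python
import math

def mae(pred, ref):
    """Mean absolute error between estimates and reference values."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)

def me(pred, ref):
    """Mean (signed) error; reveals a systematic offset."""
    return sum(p - r for p, r in zip(pred, ref)) / len(pred)

def sd(pred, ref):
    """Standard deviation of the errors about their mean (population form)."""
    m = me(pred, ref)
    return math.sqrt(sum((p - r - m) ** 2 for p, r in zip(pred, ref)) / len(pred))

# Made-up log g estimates and reference values, for illustration only
estimates = [4.10, 3.95, 4.60, 2.40]
reference = [4.00, 4.05, 4.40, 2.50]

print(f"MAE = {mae(estimates, reference):.4f} dex")
print(f"ME  = {me(estimates, reference):+.4f} dex")
print(f"SD  = {sd(estimates, reference):.4f} dex")
```

MAE and SD measure scatter, while ME measures bias; this is why the ME values in Table 6 are much smaller than the corresponding MAE and SD values.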


5.2. Performance on Synthetic Spectra with 
Ground-truth 

The proposed scheme is also evaluated on
synthetic spectra built from theoretical parameters.
The synthetic data set is described in Section 2.2.
This experiment shares the same parameters as the
experiment using SDSS data to detect features with
LASSO(LARS)bs: m is 100, and k0 is 23 for Teff, 62
for log g, and 68 for [Fe/H].

For the test set of 8500 synthetic spectra, the
MAE accuracies when the OLS estimation is used
are 0.0022 dex for log Teff (32 K for Teff), 0.0337
dex for log g, and 0.0268 dex for [Fe/H]. More
results are presented in Table 6(b).

When experimenting with real spectra, results
are usually influenced by noise and calibration
defects. Therefore, SVRl is slightly more accurate
than OLS because it incorporates a regularization
technique (Table 6(a)). For synthetic spectra, in
which no external disturbances occur, OLS is more
accurate than SVRl (Table 6(b)).


5.3. Comparison with Previous Works 


The proposed scheme is tested on both real spectra
from SDSS and synthetic spectra derived from
Kurucz's NEWODF models (Castelli et al. 2003).
Using large spectral samples from SDSS and
synthetic stellar models, several authors have attempted
to obtain accurate estimates of atmospheric parameters
in similar scenarios. These works can be
classified into two groups based on the estimation
methods: linear schemes and nonlinear schemes.

1. Nonlinear methods: Re Fiorentin et al. (2007)
investigated the stellar parameter estimation problem
based on Principal Component Analysis (PCA) and
nonlinear artificial neural networks (ANN) and obtained
MAE accuracies of 0.0126 dex for log Teff, 0.3644 dex for
log g, and 0.1949 dex for [Fe/H] on a test set of
19,000 stellar spectra from SDSS. Jofré et al. (2010)
applied a nonlinear MAχ method to
a sample set of 17,274 spectra of metal-poor
dwarf stars from SDSS/SEGUE and estimated
the effective temperature, log g, and the metallicity
with respective average accuracies of 130
K (ME), 0.5 dex (ME), and 0.24 dex (ME).
Li et al. (2014) used a LASSO scheme coupled
with nonlinear SVRG (Support Vector Regression
with a Gaussian kernel) and reached MAE
accuracies of 0.0075 dex for log Teff (101.6 K
for Teff), 0.1896 dex for log g, and 0.1821 dex for
[Fe/H].

2. Linear methods: Tan et al. (2013) used
Lick line indices of SDSS spectra and a linear
regression method: the SD accuracies are 196.5
K for Teff, 0.596 dex for log g, and 0.466 dex
for [Fe/H]. Li et al. (2014) also studied the
physical parameter estimation problem using
LASSO and SVRl, with MAE accuracies of
0.0342 dex for log Teff, 0.2534 dex for log g,
and 0.3235 dex for [Fe/H].

Finally, Re Fiorentin et al. (2007), using a test
set of 908 synthetic spectra calculated from Kurucz's
NEWODF models (Castelli et al. 2003), applied
PCA and nonlinear ANN, and obtained test
accuracies of 0.0030 dex for log Teff, 0.0245 dex
for log g, and 0.0269 dex for [Fe/H] (Table 1,
Re Fiorentin et al. 2007).

The literature results are summarized in Table 7.
It can be seen that the scheme proposed in the
present paper provides excellent performance when
estimating stellar atmospheric parameters.


6. MORE TECHNICAL DISCUSSIONS 
6.1. Configuration for WPD 

We will now investigate the influence of the
selection of wavelet basis functions and WPD level on
atmospheric parameter estimates using the 10,000
SDSS spectra of the validation set. The considered
basis functions include Biorthogonal basis (bior),
Coiflets (coif), Daubechies basis (db), Haar (haar),
ReverseBior (rbio), and Symlets (sym) (Mallat
2009).
Table 6: Performance of the proposed scheme

(a) Performance on SDSS Test Set Consisting of 30,000 Stellar Spectra

Estimation Method   Evaluation Method   log Teff (dex)   Teff (K)   log g (dex)   [Fe/H] (dex)
OLS                 MAE                  0.0062           82.94      0.2345        0.1564
OLS                 ME                   0.0002           2.769     -0.0219       -0.0003
OLS                 SD                   0.0096           135.9      0.3297        0.2196
SVRl                MAE                  0.0060           80.67      0.2225        0.1545
SVRl                ME                   0.0002           4.783     -0.0762       -0.0012
SVRl                SD                   0.0096           136.6      0.3298        0.2177

(b) Performance on Synthetic Test Set Consisting of 8500 Spectra

Estimation Method   Evaluation Method   log Teff (dex)   Teff (K)   log g (dex)   [Fe/H] (dex)
OLS                 MAE                  0.0022           31.69      0.0337        0.0268
OLS                 ME                   0.0003           2.823     -0.0004        0.0049
OLS                 SD                   0.0029           41.45      0.0687        0.0371
SVRl                MAE                  0.0031           43.74      0.0611        0.0359
SVRl                ME                  -0.0001          -2.886      0.0025        0.0024
SVRl                SD                   0.0040           58.74      0.0966        0.0514

Note. The number of extracted features is 23 for Teff, 62 for log g, and 68 for [Fe/H]. OLS (Ordinary Least
Squares): linear least squares regression. SVRl: Support Vector machine Regression with a linear kernel.


Table 7: Comparing the Proposed Scheme with Previous Works in Similar Scenarios

(a) Comparison with SDSS Data Set.

Estimation Method        Evaluation Method   log Teff (dex)   Teff (K)   log g (dex)   [Fe/H] (dex)   Size of Test Set
Linear: OLS (this work)  MAE                  0.0062           82.94      0.2345        0.1564         30,000
Linear: OLS (this work)  ME                   0.0002           2.769     -0.0219       -0.0003         30,000
Linear: OLS (this work)  SD                   0.0096           135.9      0.3297        0.2196         30,000
Nonlinear: ANN [1]       MAE                  0.0126           -          0.3644        0.1949         19,000
Nonlinear: MAχ [2]       ME                   -                130        0.5           0.24           17,274
Nonlinear: SVRG [3]      MAE                  0.007458         101.610    0.189557      0.182060       20,000
Linear: OLS [4]          SD                   -                196.473    0.596         0.466          9048
Linear: SVRl [3]         MAE                  0.034152         -          0.253363      0.323512       20,000

(b) Comparison with Synthetic Data Set Derived from Kurucz's NEWODF Models (Castelli et al. 2003)

Estimation Method        Evaluation Method   log Teff (dex)   Teff (K)   log g (dex)   [Fe/H] (dex)   Size of Test Set
Linear: OLS (this work)  MAE                  0.0022           31.70      0.0337        0.0268         30,000
Nonlinear: ANN [1]       MAE                  0.0030           -          0.0245        0.0269         19,000

Note. OLS (Ordinary Least Squares): linear least squares regression. SVRl: Support Vector machine Regression with a
linear kernel. SVRG: Support Vector machine Regression with a Gaussian kernel. ANN: Artificial neural networks. MAχ:
MAssive compression of χ². [1] Re Fiorentin et al. (2007). [2] Jofré et al. (2010). [3] Li et al. (2014). [4] Tan et al. (2013).
Consider S_wb as a set of basis functions,

    S_{wb} = \{\mathrm{bior}, \mathrm{coif}, \mathrm{db}, \mathrm{haar}, \mathrm{rbio}, \mathrm{sym}\},

S_l as a set of options for the WPD level,

    S_l = \{3, 4, 5, 6, 7\},

and S_k as a set of options for k0 in the proposed
algorithm LASSO(LARS)bs.

The configuration optimization problem can be
formulated as the search for

    \min_{wb \in S_{wb},\; level \in S_l,\; k_0 \in S_k} \mathrm{MAE}(wb, level, k_0, ap),    (15)

where ap = Teff, log g, or [Fe/H], with MAE being
the prediction error for a specific
configuration of wb, level, k0, and ap.

We initially select m = 100 features using the
LASSO(LARS)bs scheme and let

    S_k = \{100, 99, 98, \cdots, 5\}.

To obtain the optimal decomposition level, let

    \mathrm{MAE}_{wb}(level, k_0, ap) = \min_{wb \in S_{wb}} \mathrm{MAE}(wb, level, k_0, ap).    (16)

The relationship between MAE_wb and k0 is
investigated on SDSS spectra for every combination of
level ∈ S_l and ap ∈ {Teff, log g, [Fe/H]}, and the
experimental results are presented in Fig. 7(a), Fig.
7(b), and Fig. 7(c). The optimal WPD levels appear
to be 5 for Teff, 6 for log g, and 4 for [Fe/H] based
on the criterion defined in Equation (15).

Once the optimal decomposition level has been
found, the performances of various basis functions
are investigated and the associated optimal number
of features can be derived. The experimental
results are presented in Fig. 7(d), Fig. 7(e), and Fig.
7(f). Based on the criterion defined in Equation (15),
we find that the optimal basis functions and feature
numbers are, respectively, rbio and 23 for Teff, coif
and 62 for log g, and rbio and 68 for [Fe/H].
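
The two-stage search described above can be sketched as follows for one atmospheric parameter at a time. The callable `mae` stands for "extract features with the given configuration, fit on the training set, and evaluate the MAE on the validation set"; it is a placeholder of this sketch. The function names and the toy error surface `toy_mae` are hypothetical and are constructed only so that the example has a known minimum.

```python
def best_level(mae, bases, levels, k_options):
    """Stage 1: take the minimum over bases (Equation (16)) and over k0,
    then return the decomposition level with the smallest value."""
    def mae_wb(level, k0):
        return min(mae(wb, level, k0) for wb in bases)
    return min(levels, key=lambda lv: min(mae_wb(lv, k0) for k0 in k_options))

def best_basis_and_k(mae, bases, level, k_options):
    """Stage 2: at the chosen level, return the (basis, k0) pair with the smallest MAE."""
    candidates = ((wb, k0) for wb in bases for k0 in k_options)
    return min(candidates, key=lambda pair: mae(pair[0], level, pair[1]))

S_wb = ["bior", "coif", "db", "haar", "rbio", "sym"]
S_l = [3, 4, 5, 6, 7]
S_k = list(range(100, 4, -1))          # {100, 99, ..., 5}

def toy_mae(wb, level, k0):
    """Made-up error surface with its minimum at (rbio, level 5, k0 = 23)."""
    penalty = {"bior": 0.03, "coif": 0.02, "db": 0.025,
               "haar": 0.05, "rbio": 0.0, "sym": 0.02}[wb]
    return 0.01 + penalty + 0.002 * abs(level - 5) + 1e-5 * (k0 - 23) ** 2

level = best_level(toy_mae, S_wb, S_l, S_k)
wb, k0 = best_basis_and_k(toy_mae, S_wb, level, S_k)
print(level, wb, k0)
```

Fixing the level first and then comparing bases mirrors the order of panels (a) to (c) and (d) to (f) of Fig. 7, and it reaches the same configuration as a joint search over all three sets.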


6.2. Sufficiency and Compactness 

We now explore the sufficiency of the set of
LASSO(LARS)bs detected features as defined in
Table 3; that is, we study whether the accuracy of
the atmospheric parameter estimation can be
significantly improved by appending some additional
components of the WPD.

To do this, we conduct six experiments by
appending the components of WPD having the
lowest frequency or the highest frequency to the
LASSO(LARS)bs feature set. The corresponding
results are presented in rows (3) and (4) of
Table 8. For convenience, the performance of the
LASSO(LARS)bs features is repeated in row (1) of
Table 8.

It appears that the performance gain is trivial
after adding more components to the LASSO(LARS)bs
features. The WP components with the lowest
frequency are the traditional choice of spectral features
for estimating atmospheric parameters (Lu et al.
2013). If we add them to {Li}, the number of
features increases from 62 to 144 (an increase of
132.26%), but the MAE decreases by only 0.0101 dex
(4.3%). Adding them to {Fi}, the number of
features increases by 354.41% and the MAE decreases by only
6.65%. On the other hand, if we add the
components with the highest frequency to the features in
Table 3, the performance degrades
(row (4) in Table 8). Therefore, we conclude that
the detected features in Table 3 are quite sufficient.
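
The percentages quoted above follow from rows (1) and (3) of Table 8, as the following small check shows; `percent_change` is an illustrative helper name.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old

# log g: {Li} alone versus {Li} plus the lowest-frequency sub-band
print(f"log g   features: {percent_change(62, 144):+.2f}%   "
      f"MAE: {percent_change(0.2351, 0.2250):+.2f}%")

# [Fe/H]: {Fi} alone versus {Fi} plus the lowest-frequency sub-band
print(f"[Fe/H]  features: {percent_change(68, 309):+.2f}%   "
      f"MAE: {percent_change(0.1564, 0.1460):+.2f}%")
```

In both cases the feature count more than doubles while the error shrinks by only a few percent, which is the quantitative basis for calling the detected features sufficient.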


Suppose S1 and S2 are two sets of features; then
||S1|| and ||S2|| represent the number of features in
S1 and S2, respectively. If ||S1|| < ||S2||, then we
say that S1 is more compact than S2.

We also investigate the performance of the
traditional choice of the components with the lowest
frequency, and the results are presented in row (2)
of Table 8. The features in Table 3 are both more
accurate and more compact than
the components with the lowest frequency.


6.3. Redundancy - Positive or Negative? 

Redundancy is the duplication of some components
in a system. In previous sections, the removal
of redundant features is discussed as positive for
performance improvement, but should not redundancy
be useful in the presence of noise?

Potential advantages of redundancy. In theory,
redundancy should help to remove noise or at
least reduce the negative effects of noise. The
usefulness depends on the relationship between
components and their duplicates. Unfortunately, it is
difficult to uncover these relationships. Multiple
independent components are usually present, and the
duplicates of these different components are often mixed
up and hard to identify, which makes an effective use
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Number of features ko Number of features ko Number of featmes ko 


(a) For Teff (b) For log g (c) For [Fe/H] 



(d) For Te„ (e) For log g (f) For [Fe/H] 


Fig. 7.— Optimize the configuration for wavelet packet decomposition. (a),(b), and (c) Selection of Wavelet 
packet decomposition level: the optimal decomposition levels are 5 for Tgff, 6 for log g, and 4 for [Fe/H]. 
(d), (e), and (f) selection of wavelet basis function: the optimal basis functions and feature numbers are, 
respectively, rbio and 23 for Teff, coif and 62 for log g, and rbio and 68 for [Fe/H] based on the criterion in 
Equation (fT5|l . These experiments are conducted on the 10,000 SDSS spectra of the validation set (Section 
12.1[) . (a) For Teff, (b) for log g, (c) for [Fe/H], (d) for Teff, (e) for log g, (f) for [Fe/H]. 


Table 8: Sufficiency and Compactness of the Detected Features Identified in Table [3] 


label 

log Teff 

log g 

IFe/H] 

(1) 

{Ti}:23 

0.0062 

{Li}:62 

0.2351 

{Fi}:68 

0.1564 

(2) 

WP(rbio,5,0):128 

0.0068 

WP(coif,6,0):82 

0.2482 

WP(rbio,4,0):247 

0.1573 

(3) 

WP(rbio,5,0)-|-{Ti}:145 

0.0062 

WP(coif,6,0)+{Li}:144 

0.2250 

WP(rbio,4,0) +{Ti}:309 

0.1460 

(4) 

WP(rbio,5,31) + {Ti}:151 

0.0063 

WP(coif,6.63) + {Li}:144 

0.2364 

WP(rbio,4,15)-|-{Fi}:315 

0.1608 


Note. In these experiments, the atmospheric parameters are estimated by OLS method and the performance is evaluated 
by MAEs. WP(ii;p, i, j): Decompose a spectrum by wavelet packet transform based on basis wp, and take the components 
in the jth sub-band at level i as features. {Ti}, {Li}, {-Fi} represent the features {Ti, i — 1, ■ ■ ■ , 23}, {Li, i — 1, ■ ■ ■ , 62}, 
{Fi, ,i — 1, ■ ‘ ‘ , 68} in Table respectively. The number behind represent the number of selected features in a specific 
experiment. 
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of redundancy very difficult to implement. 

Potential disadvantages. Besides increasing the
computational burden, redundancy usually degrades
the quality of analyses based on computer
algorithms. The learning process of such algorithms
can be regarded as a kind of voting. In
applications, the components usually differ from
one another in their degree of redundancy, and the
amount of redundancy of a specific component
usually remains unknown. Thus, different components
in the data implicitly cast different numbers of
votes, which usually biases the evaluation and
reduces the quality of learning.
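
The voting analogy can be illustrated with a toy example. The sketch below is not the learning algorithm used in this work; it only shows, under the assumption that the "vote assessment" is a plain unweighted average, how hidden duplicates of one component pull the result toward that component. All names and values are hypothetical.

```python
def unweighted_vote(estimates):
    """Average the estimates, giving every listed component exactly one vote."""
    return sum(estimates) / len(estimates)

# Three hypothetical independent components, each voting for its own estimate.
independent = [1.0, 2.0, 3.0]
fair = unweighted_vote(independent)

# Same information, but the third component appears four times in the data
# (three hidden duplicates), so it effectively casts four votes.
with_duplicates = [1.0, 2.0, 3.0, 3.0, 3.0, 3.0]
skewed = unweighted_vote(with_duplicates)

print(fair, skewed)
```

The duplicated component shifts the average from 2.0 to 2.5 although no new information has been added, which is the effect described above.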

7. CONCLUSION 

We propose a scheme, LASSO(LARS)bs, to extract
linearly supporting (LSU) features from stellar
spectra to estimate the atmospheric parameters Teff,
log g, and [Fe/H]. 'Linearly supporting' means that
the atmospheric parameters can be accurately
estimated from the extracted features using a linear
model. One prominent characteristic of the proposed
scheme is the ability to directly evaluate the
contribution of the detected features to the estimate of the
atmospheric parameters (Table S]) and to trace back 
the physical interpretation of the features (Section 

m. 

The basic idea of this work is that the
effectiveness of a data component is sensitive to
both wavelength and frequency. Therefore, we
decompose the stellar spectra using WPs before
detecting features. It is shown that at most 1.72%
of the data components are necessary features for
estimating atmospheric parameters (Table 3), and LASSO(LARS)bs
can effectively delete the redundancy and noise (Fig. 
El). The detected features are sparse. 
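
The combination of sparse selection and linear estimation summarized here can be sketched on synthetic data. The example below is only an illustration under stated assumptions: the LASSO problem is solved by cyclic coordinate descent rather than by LARS, the additional "bs" refinement of the proposed scheme is not reproduced, the data are synthetic sinusoidal columns rather than wavelet packet coefficients of spectra, and all function names are hypothetical.

```python
import math

def soft_threshold(rho, alpha):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0

def lasso_cd(X, y, alpha, sweeps=100):
    """Minimise (1/2n)||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw for the starting point w = 0
    for _ in range(sweeps):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(c * c for c in col) / n
            # Correlation of column j with the residual, excluding its own contribution.
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, resid)) / n
            new_wj = soft_threshold(rho, alpha) / z
            resid = [r - (new_wj - w[j]) * c for c, r in zip(col, resid)]
            w[j] = new_wj
    return w

def ols(X, y):
    """Ordinary least squares via the normal equations and Gauss-Jordan elimination."""
    p = len(X[0])
    A = []  # augmented system [X^T X | X^T y]
    for a in range(p):
        gram_row = [sum(row[a] * row[b] for row in X) for b in range(p)]
        rhs = sum(row[a] * t for row, t in zip(X, y))
        A.append(gram_row + [rhs])
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [u - f * v for u, v in zip(A[r], A[k])]
    return [A[k][p] / A[k][k] for k in range(p)]

# Synthetic data: six candidate features, only columns 0 and 3 carry signal.
n, p = 64, 6
X = [[math.sin(2 * math.pi * (j + 1) * i / n) for j in range(p)] for i in range(n)]
y = [2.0 * row[0] - 3.0 * row[3] for row in X]

w = lasso_cd(X, y, alpha=0.1)                      # sparse, shrunken weights
selected = [j for j, wj in enumerate(w) if wj != 0.0]
X_sel = [[row[j] for j in selected] for row in X]
coef = ols(X_sel, y)                               # unbiased refit on selected columns
```

In this toy setting the LASSO step keeps only columns 0 and 3, with weights shrunk toward zero, and the OLS refit on the selected columns recovers the generating coefficients 2 and -3. This mirrors the division of labor between feature detection and linear estimation in the proposed scheme.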

Due to the time-frequency localization of WPD, 
we can derive the wavelength of the detected features 
(in the spectral space; Fig. 6(d), Fig. 6(e), Fig.
6(f), and Table E]). The feature wavelength position
helps us to identify the selected features with specific 
spectral lines, which leads to an understanding of the 
physical significance of the detected features (Section 

m. 

The accuracies/consistencies of the proposed
scheme LASSO(LARS)bs + OLS, with respect to the
pre-estimates by the SSPP of SDSS for real spectra
and with respect to the exact input atmospheric
parameters for synthetic stellar models, are
evaluated using three statistical indicators and
compared with previous similar works in the
literature. The proposed scheme is shown to perform
very well on both real (noisy) spectra and synthetic
stellar models; therefore, the detected features
provide excellent linear support for estimating the
atmospheric parameters Teff, log g, and [Fe/H].


Footnote to the vote analogy in Section 6.3: In
theory, every component should ideally cast only
one effective vote, for fairness.